|
In theoretical computer science, random projection is a graphical technique used to reduce the dimensionality of a set of points which lie in Euclidean space. Random projection methods are powerful methods known for their simplicity and less erroneous output compared with other methods. According to experimental results, random projection preserve distances well, but empirical results are sparse. ==Dimensionality reduction== (詳細はPrincipal Component Analysis, linear discriminant analysis, canonical correlation analysis, Discrete Cosine transform Method, Gauss Method, random Projection etc. The random projection module implements a simple and computationally efficient way to reduce the dimensionality of data by trading a controlled amount of accuracy for faster processing times and smaller model sizes. This module construct two types of random matrix: Gaussian random matrix and sparse random matrix. The dimensions and distribution of random projections matrices are controlled so as to preserve the pairwise distances between any two samples of the dataset. It is a general data set reduction method in which a higher dimensional data a projected on lower dimensional subspace so that we can get a lower size subspace maintaining the relative distances of the elements. The main idea behind random projection is Johnson-Lindenstrauss lemma〔. 〕 in which it is stated that if points in a vector space are in sufficient high dimension and we project them on a suitably high dimensional space then the distances between points remains almost preserved. In random projection, the original d-dimensional data is projected to a k-dimensional (k << d) subspace through the origin, using a random matrix R whose columns have unit lengths. Using matrix notation where is the original set of N d-dimensional observations, is the projection of the data onto a lower k-dimensional subspace.Random projection is computationally very simple: forming the random matrix "R" and projecting the data matrix X into K dimensions of order , and if the data matrix X is sparse with about c nonzero entries per column the complexity is of order . The random matrix R can be generated using Gaussian distribution like this: The first row is a random unit vector uniformly chosen from . The second row is a random unit vector from the space orthogonal to the first row, the third row is a random unit vector from the space orthogonal to the first two rows, and so on. In this way of choosing R, the following properties can be satisfied: * Spherical symmetry: For any orthogonal matrix , RA and R have the same distribution. * Orthogonality: The rows of R are orthogonal to each other. * Normality: The rows of R are unit-length vectors. Achlioptas shows that the Gaussian distribution can be replaced by a much simpler distribution such as : This is efficient for database applications because the computations can be performed using integer arithmetic. 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「Random projection」の詳細全文を読む スポンサード リンク
|